January 29
A Nonparametric Bayes Approach to Online Activity Prediction
Beraha, Mario, Masoero, Lorenzo, Favaro, Stefano, Richardson, Thomas S.
Examples include the number of users who will install a software update, the number of customers who will use a new feature on a website, or the number who will participate in an A/B test. Whether the focus is on estimating the number of individuals initiating an action or predicting the time needed to reach a desired user participation threshold, accurate predictive models play a central role in decision making, resource allocation, and enhancing user experiences. See, e.g., Kohavi et al. (2007) and Bakshy et al. (2014) for further details on online experiments. While participation data can be formally treated as a time series, the problem of forecasting user participation does not lend itself to time series models (see Richardson et al., 2022, and the references therein). Moreover, conventional models often overlook the intricate dynamics that underlie user engagement patterns: they assume that initiation times are identically distributed, ignoring the diverse behaviors and preferences exhibited by individuals. In reality, users demonstrate varying propensities to engage, leading to a multitude of initiation timelines. Recognizing this complexity, Richardson et al. (2022) recently proposed a Bayesian model for the users' initiation times, which allows different behaviors to be captured while simultaneously borrowing strength across users, as is typical in hierarchical Bayesian models.
Copula-based conformal prediction for Multi-Target Regression
Messoudi, Soundouss, Destercke, Sébastien, Rousseau, Sylvain
The most common supervised task in machine learning is to learn a single-task, single-output prediction model. However, such a setting can be ill-adapted to some problems and applications. On the one hand, producing a single output can be undesirable when data is scarce and when producing reliable, possibly set-valued predictions is important (for instance in the medical domain, where examples are very hard to collect for specific targets and where predictions are used for critical decisions). Such an issue can be solved by using conformal prediction approaches [1]. Conformal prediction was initially proposed as a transductive online learning approach to provide set predictions (in the classification case) or interval predictions (in the case of regression) with a statistical guarantee depending on the probability of error tolerated by the user, but was then extended to handle inductive processes [2]. On the other hand, there are many situations where there are multiple, possibly correlated output variables to predict at once, and it is then natural to try to leverage such correlations to improve predictions. Such learning tasks are commonly called multi-task in the literature [3]. Most research work on conformal prediction for multi-task learning focuses on the problem of multi-label prediction [4, 5], where each task is a binary classification one. Conformal prediction for multi-target regression has been less explored, with only a few studies dealing with it: Kuleshov et al. [6] provide a theoretical framework to use conformal predictors within manifolds (e.g., to provide a mono-dimensional embedding of the multi-variate output), while Neeven and Smirnov [7] use a straightforward multi-target extension of a conformal single-output k-nearest neighbor regressor [8] to provide weather forecasts.
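To make the idea of interval predictions with a user-chosen error probability concrete, here is a minimal sketch of inductive (split) conformal prediction for a single regression target. It is an illustration only, not the copula-based multi-target method of the paper: the function name `split_conformal_interval`, the least-squares line used as the underlying model, and the absolute-residual nonconformity score are all assumptions chosen for brevity.

```python
import numpy as np

def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, epsilon=0.1):
    """Return (lower, upper) interval bounds for x_new with target coverage 1 - epsilon.

    Hypothetical helper: the underlying model is a plain least-squares line,
    and the nonconformity score is the absolute residual.
    """
    # Fit the underlying point predictor on the proper training set.
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return slope * np.asarray(x, dtype=float) + intercept

    # Nonconformity scores on the held-out calibration set.
    scores = np.sort(np.abs(np.asarray(y_cal) - predict(x_cal)))
    n = len(scores)

    # Index of the conformal quantile: the ceil((n + 1)(1 - epsilon))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - epsilon)))
    radius = np.inf if k > n else scores[k - 1]

    center = predict(x_new)
    return center - radius, center + radius
```

Under exchangeability of calibration and test points, intervals built this way cover the true value with probability at least 1 - epsilon on average; a smaller epsilon gives wider intervals.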
RBM-Flow and D-Flow: Invertible Flows with Discrete Energy Base Spaces
O'Connor, Daniel, Vinci, Walter
Efficient sampling of complex data distributions can be achieved using trained invertible flows (IF), where the model distribution is generated by pushing a simple base distribution through multiple non-linear bijective transformations. However, the iterative nature of the transformations in IFs can limit the approximation to the target distribution. In this paper we seek to mitigate this by implementing RBM-Flow, an IF model whose base distribution is a Restricted Boltzmann Machine (RBM) with a continuous smoothing applied. We show that by using RBM-Flow we are able to improve the quality of the generated samples, as quantified by the Inception Score (IS) and Fréchet Inception Distance (FID), over baseline models with the same IF transformations but with less expressive base distributions. Furthermore, we also obtain D-Flow, an IF model with uncorrelated discrete latent variables. We show that D-Flow achieves similar likelihoods and FID/IS scores to those of a typical IF with Gaussian base variables, but with the additional benefit that global features are meaningfully encoded as discrete labels in the latent space.
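The mechanism of pushing a base distribution through bijective transformations can be sketched in a few lines. The example below is a toy, one-dimensional illustration under stated assumptions: it uses a standard Gaussian base (not the smoothed RBM base of RBM-Flow), and the layer classes `Affine` and `Exp` and the helpers `flow_sample` and `flow_logpdf` are hypothetical names introduced here. The log-density follows from the change-of-variables formula: the base log-density at the inverted point minus the accumulated log-absolute-derivatives of the forward maps.

```python
import numpy as np

def std_normal_logpdf(z):
    # Log-density of the (assumed) standard Gaussian base distribution.
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

class Affine:
    """Bijection x = scale * z + shift (scale must be nonzero)."""
    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift
    def forward(self, z):
        return self.scale * z + self.shift
    def inverse(self, x):
        return (x - self.shift) / self.scale
    def log_abs_det(self, z):
        # log |dx/dz| is constant for an affine map.
        return np.full_like(z, np.log(abs(self.scale)))

class Exp:
    """Non-linear bijection x = exp(z), mapping the real line to (0, inf)."""
    def forward(self, z):
        return np.exp(z)
    def inverse(self, x):
        return np.log(x)
    def log_abs_det(self, z):
        # dx/dz = exp(z), so log |dx/dz| = z.
        return z

def flow_sample(layers, n, rng):
    # Draw from the base, then push the samples through each layer in order.
    z = rng.standard_normal(n)
    for layer in layers:
        z = layer.forward(z)
    return z

def flow_logpdf(layers, x):
    # Invert the layers in reverse order, accumulating log |dx/dz| per layer.
    x = np.asarray(x, dtype=float)
    total_log_det = np.zeros_like(x)
    for layer in reversed(layers):
        z = layer.inverse(x)
        total_log_det += layer.log_abs_det(z)
        x = z
    return std_normal_logpdf(x) - total_log_det
```

With `layers = [Affine(0.5, 1.0), Exp()]` the flow reproduces a log-normal distribution with parameters mu = 1 and sigma = 0.5, which gives a closed form to check `flow_logpdf` against.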
OPFython: A Python-Inspired Optimum-Path Forest Classifier
de Rosa, Gustavo Henrique, Papa, João Paulo, Falcão, Alexandre Xavier
Machine learning techniques have been paramount throughout the last years, being applied in a wide range of tasks, such as classification, object recognition, person identification, and image segmentation, among others. Nevertheless, conventional classification algorithms, e.g., Logistic Regression, Decision Trees, and Bayesian classifiers, might lack complexity and diversity, making them unsuitable when dealing with real-world data. A recent graph-inspired classifier, known as the Optimum-Path Forest, has proven to be a state-of-the-art technique, comparable to Support Vector Machines and even surpassing them in some tasks. In this paper, we propose a Python-based Optimum-Path Forest framework, denoted as OPFython, where all of its functions and classes are based upon the original C language implementation. Additionally, as OPFython is a Python-based library, it provides a friendlier environment and a faster prototyping workspace than the C language.